{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Downloading Continuous Data\n",
    "This notebook demonstrates the use of EQTransformer for downloading continuous data and for make hdf5 files . \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "from EQTransformer.utils.downloader import makeStationList, downloadMseeds"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use help() to learn about input parameters of each fuunction. For instance:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Help on function makeStationList in module EQTransformer.utils.downloader:\n\nmakeStationList(json_path, client_list, min_lat, max_lat, min_lon, max_lon, start_time, end_time, channel_list=[], filter_network=[], filter_station=[], **kwargs)\n    Uses fdsn to find available stations in a specific geographical location and time period.  \n    \n    Parameters\n    ----------\n    json_path: str\n        Path of the json file that will be returned\n    \n    client_list: list\n        List of client names e.g. [\"IRIS\", \"SCEDC\", \"USGGS\"].\n                                \n    min_lat: float\n        Min latitude of the region.\n        \n    max_lat: float\n        Max latitude of the region.\n        \n    min_lon: float\n        Min longitude of the region.\n        \n    max_lon: float\n        Max longitude of the region.\n        \n    start_time: str\n        Start DateTime for the beginning of the period in \"YYYY-MM-DDThh:mm:ss.f\" format.\n        \n    end_time: str\n        End DateTime for the beginning of the period in \"YYYY-MM-DDThh:mm:ss.f\" format.\n        \n    channel_list: str, default=[]\n        A list containing the desired channel codes. Downloads will be limited to these channels based on priority. Defaults to [] --> all channels\n        \n    filter_network: str, default=[]\n        A list containing the network codes that need to be avoided. \n        \n    filter_station: str, default=[]\n        A list containing the station names that need to be avoided.\n    \n    kwargs: \n        special symbol for passing Client.get_stations arguments\n    \n    Returns\n    ----------\n    stations_list.json: A dictionary containing information for the available stations.\n\n"
     ]
    }
   ],
   "source": [
    "help(makeStationList)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1) Finding the availabel stations "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Defining the network, station, location, channel and time period of interest:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "NET = \"CI\"\n",
    "STA = \"BAK,ARV\"\n",
    "LOC = \"*\"\n",
    "CHA =\"BHZ\"\n",
    "STIME=\"2020-09-01 00:00:00.00\"\n",
    "ETIME=\"2020-09-02 00:00:00.00\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This will download the information on the stations that are available based on your search criteria. You can filter out the networks or stations that you are not interested in, you can find the name of the appropriate client for your request from here:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "CI--ARV\nCI--BAK\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "json_basepath = os.path.join(os.getcwd(),\"json/station_list.json\")\n",
    "makeStationList(json_path=json_basepath,\n",
    "                  client_list=[\"IRIS\"],\n",
    "                  min_lat=None, max_lat=None, min_lon=None, max_lon=None, \n",
    "                  network=NET,\n",
    "                  station=STA,\n",
    "                  location=LOC,\n",
    "                  channel=CHA,                      \n",
    "                  start_time=STIME, \n",
    "                  end_time=ETIME,\n",
    "                  filter_network=[])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A json file should have been created in the json path. This contains information for the available stations (i.e. 2 stations in this case). Next, you can download the data for the available stations using the following function and script. This may take a few minutes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2) Downloading the data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can define multipel clients as the source:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stderr",
     "text": [
      "[2020-10-13 19:17:43,089] - obspy.clients.fdsn.mass_downloader - INFO: Initializing FDSN client(s) for IRIS.\n",
      "[2020-10-13 19:17:43,093] - obspy.clients.fdsn.mass_downloader - INFO: Successfully initialized 1 client(s): IRIS.\n",
      "[2020-10-13 19:17:43,096] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0\n",
      "[2020-10-13 19:17:43,097] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Requesting reliable availability.\n",
      "####### There are 2 stations in the list. #######\n",
      "======= Working on ARV station.\n",
      "[2020-10-13 19:17:44,003] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Successfully requested availability (0.91 seconds)\n",
      "[2020-10-13 19:17:44,009] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Found 1 stations (3 channels).\n",
      "[2020-10-13 19:17:44,010] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Will attempt to download data from 1 stations.\n",
      "[2020-10-13 19:17:44,010] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Status for 3 time intervals/channels before downloading: EXISTS\n",
      "[2020-10-13 19:17:44,013] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - No station information to download.\n",
      "[2020-10-13 19:17:44,014] - obspy.clients.fdsn.mass_downloader - INFO: ============================== Final report\n",
      "[2020-10-13 19:17:44,014] - obspy.clients.fdsn.mass_downloader - INFO: 3 MiniSEED files [25.1 MB] already existed.\n",
      "[2020-10-13 19:17:44,015] - obspy.clients.fdsn.mass_downloader - INFO: 1 StationXML files [0.0 MB] already existed.\n",
      "[2020-10-13 19:17:44,015] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Acquired 0 MiniSEED files [0.0 MB].\n",
      "[2020-10-13 19:17:44,015] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Acquired 0 StationXML files [0.0 MB].\n",
      "[2020-10-13 19:17:44,016] - obspy.clients.fdsn.mass_downloader - INFO: Downloaded 0.0 MB in total.\n",
      "** done with --> ARV -- CI -- 2020-09-01\n",
      "[2020-10-13 19:18:10,043] - obspy.clients.fdsn.mass_downloader - INFO: Total acquired or preexisting stations: 0\n",
      "[2020-10-13 19:18:10,044] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Requesting reliable availability.\n",
      "======= Working on BAK station.\n",
      "[2020-10-13 19:18:10,556] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Successfully requested availability (0.51 seconds)\n",
      "[2020-10-13 19:18:10,562] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Found 1 stations (3 channels).\n",
      "[2020-10-13 19:18:10,563] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Will attempt to download data from 1 stations.\n",
      "[2020-10-13 19:18:10,564] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Status for 3 time intervals/channels before downloading: EXISTS\n",
      "[2020-10-13 19:18:10,567] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - No station information to download.\n",
      "[2020-10-13 19:18:10,567] - obspy.clients.fdsn.mass_downloader - INFO: ============================== Final report\n",
      "[2020-10-13 19:18:10,568] - obspy.clients.fdsn.mass_downloader - INFO: 3 MiniSEED files [50.4 MB] already existed.\n",
      "[2020-10-13 19:18:10,568] - obspy.clients.fdsn.mass_downloader - INFO: 1 StationXML files [0.0 MB] already existed.\n",
      "[2020-10-13 19:18:10,568] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Acquired 0 MiniSEED files [0.0 MB].\n",
      "[2020-10-13 19:18:10,569] - obspy.clients.fdsn.mass_downloader - INFO: Client 'IRIS' - Acquired 0 StationXML files [0.0 MB].\n",
      "[2020-10-13 19:18:10,569] - obspy.clients.fdsn.mass_downloader - INFO: Downloaded 0.0 MB in total.\n",
      "** done with --> BAK -- CI -- 2020-09-01\n"
     ]
    }
   ],
   "source": [
    "downloadMseeds(client_list=[\"IRIS\"], \n",
    "          stations_json=json_basepath, \n",
    "          output_dir=\"downloads_mseeds\", \n",
    "          start_time=STIME, \n",
    "          end_time=ETIME, \n",
    "          min_lat=None, max_lat=None, min_lon=None, max_lon=None,\n",
    "          chunk_size=1,\n",
    "          channel_list=[],\n",
    "          n_processor=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above will download the continous data (either in MiniSeed or SAC) and save them into individual folders for each station insider your defined output directory (i.e. downloads_mseeds)."
   ]
  },
  {
   "source": [
    "# HDF5 maker"
   ],
   "cell_type": "markdown",
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "Help on function preprocessor in module EQTransformer.utils.hdf5_maker:\n\npreprocessor(preproc_dir, mseed_dir, stations_json, overlap=0.3, n_processor=None)\n    Performs preprocessing and partitions the continuous waveforms into 1-minute slices. \n    \n    Parameters\n    ----------\n    preproc_dir: str\n        Path of the directory where will be located the summary files generated by preprocessor step.\n    \n    mseed_dir: str\n        Path of the directory where the mseed files are located. \n    \n    stations_json: str\n        Path to a JSON file containing station information.        \n        \n    overlap: float, default=0.3\n        If set, detection, and picking are performed in overlapping windows.\n           \n    n_processor: int, default=None \n        The number of CPU processors for parallel downloading.         \n    \n    Returns\n    ----------\n    mseed_dir_processed_hdfs/station.csv: Phase information for the associated events in hypoInverse format. \n    \n    mseed_dir_processed_hdfs/station.hdf5: Containes all slices and preprocessed traces. \n    \n    preproc_dir/X_preprocessor_report.txt: A summary of processing performance. \n    \n    preproc_dir/time_tracks.pkl: Contain the time track of the continous data and its type.\n\n"
     ]
    }
   ],
   "source": [
    "from EQTransformer.utils.hdf5_maker import preprocessor\n",
    "help(preprocessor)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      " *** \" downloads_mseeds \" directory already exists!\n",
      "============ Station ARV has 1 chunks of data.\n",
      "============ Station BAK has 1 chunks of data.\n",
      "  * ARV (1) .. 20200901 --> 20200902 .. 3 components .. sampling rate: 100.0\n",
      "  * BAK (1) .. 20200901 --> 20200902 .. 3 components .. sampling rate: 100.0\n",
      " Station ARV had 1 chuncks of data\n",
      "2056 slices were written, 2057.0 were expected.\n",
      "Number of 1-components: 0. Number of 2-components: 0. Number of 3-components: 1.\n",
      "Original samplieng rate: 100.0.\n",
      " Station BAK had 1 chuncks of data\n",
      "2056 slices were written, 2057.0 were expected.\n",
      "Number of 1-components: 0. Number of 2-components: 0. Number of 3-components: 1.\n",
      "Original samplieng rate: 100.0.\n"
     ]
    }
   ],
   "source": [
    "preprocessor(preproc_dir=\"preproc\",mseed_dir='downloads_mseeds', stations_json=json_basepath, overlap=0.3, n_processor=2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "name": "Python 3.7.9 64-bit ('eqt': conda)",
   "display_name": "Python 3.7.9 64-bit ('eqt': conda)",
   "metadata": {
    "interpreter": {
     "hash": "9cb62e47ce4305e592b1acbd685a23ac343ebc53b26a5150204726fc6e8485b2"
    }
   }
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.9-final"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}